Out-of-distribution (OOD) detection has attracted considerable attention from the machine learning research community in recent years due to its importance in deployed systems. Most previous studies have focused on detecting OOD samples in the multi-class classification task. However, OOD detection in the multi-label classification task remains an underexplored domain. In this research, we propose YolOOD, a method that utilizes concepts from the object detection domain to perform OOD detection in the multi-label classification task. Object detection models have an inherent ability to distinguish between objects of interest (in-distribution) and irrelevant objects (e.g., OOD objects) in images that contain multiple objects from different categories. These abilities allow us to convert a regular object detection model into an image classifier with inherent OOD detection capabilities with just minor changes. We compare our approach to state-of-the-art OOD detection methods and demonstrate YolOOD's ability to outperform these methods on a comprehensive suite of in-distribution and OOD benchmark datasets.
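Deep learning-based facial recognition systems have received increased media attention for exhibiting unfair behavior. Large enterprises, such as IBM, shut down their facial recognition and age prediction systems as a consequence. Age prediction is a particularly difficult application, and its fairness remains an open research problem (e.g., predicting age equally accurately for different ethnicities). One of the main causes of unfair behavior in age prediction methods lies in the distribution and diversity of the training data. In this work, we present two novel approaches for dataset curation and data augmentation that increase fairness through balanced feature curation and increase diversity through distribution-aware augmentation. To achieve this, we introduce out-of-distribution detection to the facial recognition domain, where it is used to select the data most relevant to the deep neural network's (DNN) task when balancing the data among age, ethnicity, and gender. Our approach shows promising results. Our best-trained DNN model performs 4.92 times better in terms of fairness and also improves the DNN's ability to generalize, outperforming the Amazon AWS and Microsoft Azure public cloud systems by 31.88% and 10.95%, respectively.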
Late-life depression (LLD) is a highly prevalent mood disorder occurring in older adults and is frequently accompanied by cognitive impairment (CI). Studies have shown that LLD may increase the risk of Alzheimer's disease (AD). However, the heterogeneity of presentation of geriatric depression suggests that multiple biological mechanisms may underlie it. Current biological research on LLD progression incorporates machine learning that combines neuroimaging data with clinical observations. There are few studies on incident cognitive diagnostic outcomes in LLD based on structural MRI (sMRI). In this paper, we describe the development of a hybrid representation learning (HRL) framework for predicting cognitive diagnosis over 5 years based on T1-weighted sMRI data. Specifically, we first extract prediction-oriented MRI features via a deep neural network, and then integrate them with handcrafted MRI features via a Transformer encoder for cognitive diagnosis prediction. Two tasks are investigated in this work, including (1) identifying cognitively normal subjects with LLD and never-depressed older healthy subjects, and (2) identifying LLD subjects who developed CI (or even AD) and those who stayed cognitively normal over five years. To the best of our knowledge, this is among the first attempts to study the complex heterogeneous progression of LLD based on task-oriented and handcrafted MRI features. We validate the proposed HRL on 294 subjects with T1-weighted MRIs from two clinically harmonized studies. Experimental results suggest that the HRL outperforms several classical machine learning and state-of-the-art deep learning methods in LLD identification and prediction tasks.
Training neural networks is a computationally intensive task. Understanding and modeling the training dynamics is becoming more important as increasingly large networks are trained. In this work, we propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality. We refer to our algorithm as correlation mode decomposition (CMD). It splits the parameter space into groups of parameters (modes) that behave in a highly correlated manner through the epochs. We achieve a remarkable dimensionality reduction with this approach: networks such as ResNet-18, transformers, and GANs, containing millions of parameters, can be modeled well using just a few modes. We observe that the typical time profile of each mode is spread throughout the network, across all layers. Moreover, our model induces regularization, which yields better generalization on the test set. This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
In recent years, there has been great interest in machine-learning-based heuristics for solving NP-hard combinatorial optimization problems. The developed methods have shown potential on many optimization problems. In this paper, we present a learned heuristic for the reoptimization of a problem after a minor change in its data. We focus on the case of the capacitated vehicle routing problem with static clients (i.e., same client locations) and changed demands. Given the edges of an original solution, the goal is to predict and fix the ones that have a high chance of remaining in an optimal solution after a change of client demands. This partial prediction of the solution reduces the complexity of the problem and speeds up its resolution, while yielding a good-quality solution. The proposed approach resulted in solutions with an optimality gap ranging from 0% to 1.7% on different benchmark instances within a reasonable computing time.
Most cross-domain unsupervised Video Anomaly Detection (VAD) works assume that at least a few task-relevant target domain training data are available for adaptation from the source to the target domain. However, this requires laborious model-tuning by the end-user, who may prefer to have a system that works "out-of-the-box." To address such practical scenarios, we identify a novel target domain (inference-time) VAD task where no target domain training data are available. To this end, we propose a new "Zero-shot Cross-domain Video Anomaly Detection (zxvad)" framework that includes a future-frame prediction generative model setup. Different from prior future-frame prediction models, our model uses a novel Normalcy Classifier module to learn the features of normal event videos by learning how such features differ "relatively" from features in pseudo-abnormal examples. A novel Untrained Convolutional Neural Network based Anomaly Synthesis module crafts these pseudo-abnormal examples by adding foreign objects to normal video frames with no extra training cost. With our novel relative normalcy feature learning strategy, zxvad generalizes and learns to distinguish between normal and abnormal frames in a new target domain without adaptation during inference. Through evaluations on common datasets, we show that zxvad outperforms the state-of-the-art (SOTA), regardless of whether task-relevant (i.e., VAD) source training data are available or not. Lastly, zxvad also beats the SOTA methods in inference-time efficiency metrics, including the model size, total parameters, GPU energy consumption, and GMACs.
Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections to avoid the problem of vanishing gradients, they can be viewed as the numerical integration of a differential equation. In this extended abstract, we build upon this connection and propose a modification of the internal architecture of a transformer layer. The proposed model places the multi-head attention sublayer and the MLP sublayer parallel to each other. Our experiments show that this simple modification improves the performance of transformer networks in multiple tasks. Moreover, for the image classification task, we show that using neural ODE solvers with a sophisticated integration scheme further improves performance.
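The difference between the usual sequential arrangement and the parallel arrangement described in this abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names, the omission of layer normalization, and the toy stand-ins `toy_attn` and `toy_mlp`, which are not real attention or MLP sublayers.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, used for the residual connections."""
    return [x + y for x, y in zip(a, b)]


def sequential_block(x: Vector, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Standard ordering: the MLP sublayer sees the output of the attention sublayer."""
    h = add(x, attn(x))
    return add(h, mlp(h))


def parallel_block(x: Vector, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Parallel ordering: both sublayers read the same input x,
    and their outputs are added to the residual stream together."""
    return add(x, add(attn(x), mlp(x)))


# Hypothetical toy sublayers, for illustration only.
def toy_attn(x: Vector) -> Vector:
    mean = sum(x) / len(x)          # mixes information across positions
    return [mean for _ in x]


def toy_mlp(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]  # position-wise ReLU


x = [1.0, -3.0]
print(sequential_block(x, toy_attn, toy_mlp))  # [0.0, -4.0]
print(parallel_block(x, toy_attn, toy_mlp))    # [1.0, -4.0]
```

In both variants the update has the form x + f(x), which is what allows a residual layer to be read as one explicit Euler step of a differential equation; the two variants differ only in how f is composed from the two sublayers.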
Image segmentation is a fundamental task in computer vision. Data annotation for training supervised methods can be labor-intensive, motivating unsupervised methods. Some existing approaches extract deep features from pre-trained networks and build a graph to apply classical clustering methods (e.g., $k$-means and normalized cuts) as a post-processing stage. These techniques reduce the high-dimensional information encoded in the features to pair-wise scalar affinities. In this work, we replace classical clustering algorithms with a lightweight Graph Neural Network (GNN) trained to achieve the same clustering objective function. However, in contrast to existing approaches, we feed the GNN not only the pair-wise affinities between local image features but also the raw features themselves. Maintaining this connection between the raw features and the clustering goal allows us to perform part semantic segmentation implicitly, without requiring additional post-processing steps. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training our image segmentation GNN. Additionally, we use the Correlation-Clustering (CC) objective to perform clustering without defining the number of clusters ($k$-less clustering). We apply the proposed method to object localization, segmentation, and semantic part segmentation tasks, surpassing state-of-the-art performance on multiple benchmarks.
In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections but may still keep some detections with low objectness scores. In order to find the exact number of objects and their labels in the image, we propose a post-processing method called Detection Selection Algorithm (DSA), which is used after NMS or related methods. DSA greedily selects a subset of detected bounding boxes, together with full object reconstructions that give the interpretation of the whole image with the highest likelihood, taking into account object occlusions. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain occlusion relationships between objects. Second, we develop a single reconstruction algorithm which can reconstruct the whole appearance of an object given its visible part, based on the optimization of latent variables of a trained generative network which we call the decoder. Third, we propose a whole reconstruction algorithm which generates the joint reconstruction of all objects in a hypothesized interpretation, taking into account occlusion ordering. Finally, we propose a greedy algorithm that incrementally adds or removes detections from a list to maximize the likelihood of the corresponding interpretation. DSA with NMS or Soft-NMS can achieve better results than NMS or Soft-NMS themselves, as is illustrated in our experiments on synthetic images with multiple 3D objects.
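For context, the standard greedy NMS step that DSA is applied after can be sketched as follows. This is a generic textbook version and not the paper's DSA; the box format `(x1, y1, x2, y2)`, the function names, and the default threshold of 0.5 are assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes in descending score order and keep a box
    only if it does not overlap an already kept box by more than the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.3]
print(nms(boxes, scores))  # [0, 2]
```

In this example the second box is suppressed because it overlaps the first, while the isolated third box survives despite its low score of 0.3. This is the kind of low-score detection that the abstract says NMS may still keep.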